LOSS: A Simple Text Sanitization Tool

LOSS is a command-line tool written in Go that cleans and normalizes text. It is designed primarily for text produced by large language models (LLMs), but it works just as well on ordinary prose.

Rather than rewriting content from scratch, LOSS focuses on removing stylistic artifacts that make text feel artificial or templated.

Contents
  • Why LOSS Exists
  • How LOSS Works
  • Command Line Usage
  • Determinism and Seeds
  • Optional Local Model Support
  • What LOSS Is Not
  • Conclusion
  • About the Project

Why LOSS Exists

Modern language models tend to produce recognizable patterns. These include boilerplate phrases, overly balanced sentence rhythm, excessive polish, Markdown formatting, and emojis. While none of these are inherently wrong, they often make text feel less human.

LOSS exists to remove those patterns while preserving meaning. It does not attempt to disguise authorship, and it does not guarantee that text will get past AI-text detectors. Its goal is normalization, not obfuscation.

LOSS intentionally allows small imperfections. Human writing is uneven, and perfect polish is often a red flag.

How LOSS Works

LOSS processes text through a sequence of simple, ordered stages. Each stage makes small changes and passes the result to the next stage. No single step performs aggressive rewriting.

  • Structural normalization and Markdown removal
  • Punctuation normalization
  • Emoji removal
  • Sentence rhythm variation
  • Vocabulary flattening
  • Partial sentence rephrasing
  • LLM phrase suppression
  • Final cleanup and capitalization

The cumulative effect is text that feels less mechanical and more naturally written.
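The staged design can be sketched in Go as a chain of plain string functions. This is a minimal sketch, not the LOSS source: the Stage type, the Run function, and the stage names are hypothetical, and only four deterministic, rule-based stages are shown (Markdown removal, punctuation normalization, emoji removal, and cleanup with capitalization). The regular expressions and emoji ranges are simplified assumptions, not a complete treatment.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// Stage is one small transformation (hypothetical type, not from the LOSS source).
type Stage func(string) string

// Run passes the text through each stage in order.
func Run(text string, stages ...Stage) string {
	for _, stage := range stages {
		text = stage(text)
	}
	return text
}

var (
	headingRe  = regexp.MustCompile(`(?m)^#{1,6}[ \t]+`)      // "## Title"
	bulletRe   = regexp.MustCompile(`(?m)^[ \t]*[-*+][ \t]+`) // "- item", "* item"
	emphasisRe = regexp.MustCompile("\\*\\*|__|\\*|`")       // bold, italics, inline code
	spacesRe   = regexp.MustCompile(`[ \t]+`)
	preSpaceRe = regexp.MustCompile(` +([,.;:!?])`) // stray space before punctuation
)

// stripMarkdown removes common Markdown markers (simplified, not a full parser).
func stripMarkdown(s string) string {
	s = headingRe.ReplaceAllString(s, "")
	s = bulletRe.ReplaceAllString(s, "")
	return emphasisRe.ReplaceAllString(s, "")
}

// punctuation maps typographic characters to plain ASCII equivalents.
var punctuation = strings.NewReplacer(
	"\u201c", `"`, "\u201d", `"`, // curly double quotes
	"\u2018", "'", "\u2019", "'", // curly single quotes
	"\u2014", ", ", // em dash becomes a comma
	"\u2026", "...", // ellipsis character
)

func normalizePunctuation(s string) string { return punctuation.Replace(s) }

// removeEmoji drops runes in the main emoji blocks (an approximation).
func removeEmoji(s string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 0x1F300 && r <= 0x1FAFF, // pictographs, emoticons, transport, etc.
			r >= 0x2600 && r <= 0x27BF, // miscellaneous symbols and dingbats
			r == 0xFE0F, r == 0x200D: // variation selector-16, zero-width joiner
			return -1 // negative return value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

// cleanup collapses spaces, removes spaces before punctuation, trims each line,
// and capitalizes the first letter of each non-empty line.
func cleanup(s string) string {
	s = spacesRe.ReplaceAllString(s, " ")
	s = preSpaceRe.ReplaceAllString(s, "$1")
	lines := strings.Split(s, "\n")
	for i, line := range lines {
		r := []rune(strings.TrimSpace(line))
		if len(r) > 0 {
			r[0] = unicode.ToUpper(r[0])
		}
		lines[i] = string(r)
	}
	return strings.Join(lines, "\n")
}

// defaultStages is a subset of the listed stages, kept in the same relative order.
var defaultStages = []Stage{stripMarkdown, normalizePunctuation, removeEmoji, cleanup}

func main() {
	in := "## Summary\n\n**great** question \u2014 here\u2019s the answer \U0001F680"
	fmt.Println(Run(in, defaultStages...))
	// Output:
	// Summary
	//
	// Great question, here's the answer
}
```

Because each stage only takes and returns a string, stages can be reordered, disabled, or tested in isolation, which matches the idea that no single step performs aggressive rewriting.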

Command Line Usage

LOSS reads from standard input and writes to standard output, so it fits easily into scripts, pipelines, and build tools. On Windows (cmd.exe):

type input.txt | loss

On Linux and macOS, use cat or shell redirection instead:

loss < input.txt > output.txt

Flags tune the behavior: --flatten-vocab controls how much vocabulary flattening occurs, --rephrase-ratio controls how much rephrasing occurs, and --seed makes the output reproducible.

type input.txt | loss --flatten-vocab=high --rephrase-ratio=0.3 --seed=42

Determinism and Seeds

LOSS uses randomness to avoid uniform output. This randomness affects punctuation choices, sentence rhythm, and rephrasing selection.

When a seed is provided, all randomness becomes deterministic: the same input, processed with the same seed and flags, always produces the same output.

Use a fixed seed when debugging or when you need reproducible results in automated workflows.

Optional Local Model Support

By default, LOSS is fully rule-based and does not require any language model.

Optionally, LOSS can call a local OpenAI-compatible API to improve sentence rephrasing and vocabulary simplification. When this is enabled, all processing still stays within your own environment.

What LOSS Is Not

  • LOSS is not a detection-evasion tool
  • LOSS does not guarantee perfect grammar
  • LOSS does not perform full paraphrasing
  • LOSS is not designed to alter meaning

LOSS is intentionally conservative. It cleans text; it does not reinvent it.

Conclusion

LOSS is a small, focused utility for people who want cleaner, more natural text without aggressive rewriting. It favors transparency, control, and simplicity over heavy automation.

If you want text that feels less templated and more human, LOSS is worth a look.

About the Project

LOSS is an open source project designed to be simple to understand and easy to reimplement.

Source code: https://github.com/uriel-flame-of-god/LOSS
