Model Card: BLOOM-560m for Personal Sharing Classification

This model is a fine-tuned version of BLOOM-560m designed to classify personal experience sharing in social media text. It was developed to explore how different generations (Baby Boomers and Gen X) express themselves on pseudonymous platforms like Reddit.

Model Details

  • Model Type: Large Language Model (Decoder-only) fine-tuned for sequence classification.
  • Language: English.
  • Finetuned from model: bigscience/bloom-560m.
  • Application: Sociotechnical research on digital aging and online self-disclosure.

Intended Use

Primary Task

The model classifies individual sentences into one of four categories to analyze domains of self-disclosure in online forums.

Categories

  • Health and Wellness (Label 0): Personal experiences regarding physical/mental health, treatments, or aging-related bodily changes.
  • Personal Relationships and Identity (Label 1): Sentences describing social ties, family, friendships, or social identities.
  • Professional and Financial (Label 2): Reflections on work, career history, retirement planning, and financial management.
  • Not Related to Personal Sharing (Label 3): Non-reflective content, general information, or social pleasantries.
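Downstream code typically needs to translate the integer labels above back into category names. A minimal sketch of that mapping, noting that the exact id2label strings stored in the model config are an assumption based on the list above:

```python
# Hypothetical id-to-category mapping following the label list above;
# the exact strings in the model's config.json are an assumption.
ID2LABEL = {
    0: "Health and Wellness",
    1: "Personal Relationships and Identity",
    2: "Professional and Financial",
    3: "Not Related to Personal Sharing",
}

def label_name(label_id: int) -> str:
    """Map a predicted class id to its human-readable category."""
    return ID2LABEL[label_id]
```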

Training Data

  • Source: Publicly available posts and comments from the Reddit subreddit r/AskOldPeople.
  • Size: 2,000 manually labeled sentences (stratified sampling: 500 per category).
  • Data Split: 80% Training, 10% Validation, 10% Test.
  • Preprocessing: Sentences were tokenized using the Punkt sentence tokenizer.
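The 80/10/10 split above is stratified, so each category contributes proportionally to every partition. A minimal stdlib sketch of such a split (illustrative only; the authors' actual tooling is not specified):

```python
import random

def stratified_split(examples, train_frac=0.8, val_frac=0.1, seed=42):
    """Split (sentence, label) pairs into train/val/test partitions
    per label, mirroring the 80/10/10 stratified split described above."""
    rng = random.Random(seed)
    by_label = {}
    for sent, label in examples:
        by_label.setdefault(label, []).append((sent, label))
    splits = {"train": [], "val": [], "test": []}
    for items in by_label.values():
        rng.shuffle(items)
        n_train = int(len(items) * train_frac)
        n_val = int(len(items) * val_frac)
        splits["train"].extend(items[:n_train])
        splits["val"].extend(items[n_train:n_train + n_val])
        splits["test"].extend(items[n_train + n_val:])
    return splits

# Dummy data shaped like the dataset: 500 sentences per category.
data = [(f"sentence {i}", i % 4) for i in range(2000)]
parts = stratified_split(data)
```

With 500 sentences per category, this yields 1,600 training, 200 validation, and 200 test sentences, each partition balanced across the four labels.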

Performance

The model achieved strong performance on the held-out test set:

  • F1 Score: 0.9599

Usage

You can use this model directly with the Hugging Face transformers library:

from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="ernchern/personal_info_classification")

text = "I am 67, retired in August, and most basic expenses are covered by Social Security."
result = classifier(text)
print(result)  # e.g. [{'label': ..., 'score': ...}]
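The score the pipeline reports is a softmax probability over the four class logits, and the label is the argmax. A minimal sketch of that scoring step with hypothetical logits (the values are illustrative, not model outputs):

```python
import math

def softmax(logits):
    """Convert raw classifier logits into probabilities, as the
    text-classification pipeline does before reporting a score."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the four labels; here label 2
# ("Professional and Financial") has the largest logit.
logits = [-1.2, 0.3, 4.1, -0.5]
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)
```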