ronantakizawa (Ronan Takizawa)

reacted to Holy-fox's post with 🤗 about 4 hours ago

Post

1432

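For some reason, the Hugging Face Pro plan I got from the Cerebras hackathon I joined around April(?) is still active...

It was probably supposed to last only for the hackathon period, but it hasn't gone away.
Well, I haven't registered a credit card or anything, so I think it should be fine.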

3 replies

replied to Holy-fox's post about 5 hours ago

Same here, I also joined a hackathon once and my Hugging Face Pro plan has kept going! The Pro plan ended after 6 months.

posted an update 1 day ago

Post

174

Reached 2500+ total downloads across my models and datasets! 🎉

Follow me for more @ronantakizawa

reacted to their post with 🔥 4 days ago

Post

2417

Introducing the japanese-trending-words dataset: a dataset consisting of 593 words from Japan’s annual trending word rankings (流行語大賞) from 2006 to 2025. This dataset provides the top 30 words from each year and their meanings in Japanese and English. This resource is awesome for NLP tasks that involve understanding recent Japanese culture and history.

ronantakizawa/japanese-trending-words

#japanese #japanesedataset #trending

posted an update 4 days ago

Post

264

Introducing the india-trending-words dataset: a compilation of 900 trending Google searches from 2006-2024 based on https://trends.withgoogle.com. This dataset captures search trends in 80 categories, and is perfect for analyzing cultural shifts and predicting future trends in India.

#india #indiadataset #googlesearches

ronantakizawa/india-trending-words

posted an update 6 days ago

Post

2417

Introducing the japanese-trending-words dataset: a dataset consisting of 593 words from Japan’s annual trending word rankings (流行語大賞) from 2006 to 2025. This dataset provides the top 30 words from each year and their meanings in Japanese and English. This resource is awesome for NLP tasks that involve understanding recent Japanese culture and history.

ronantakizawa/japanese-trending-words

#japanese #japanesedataset #trending

reacted to their post with 👍 8 days ago

Post

988

Introducing the google-trending-words dataset: a compilation of 2784 trending Google searches from 2001-2024 based on https://trends.withgoogle.com. This dataset captures search trends in 93 categories, and is perfect for analyzing cultural shifts, predicting future trends, and understanding how global events shape online behavior.

#trends #google #googlesearches

ronantakizawa/trending-words-google

posted an update 11 days ago

Post

988

Introducing the google-trending-words dataset: a compilation of 2784 trending Google searches from 2001-2024 based on https://trends.withgoogle.com. This dataset captures search trends in 93 categories, and is perfect for analyzing cultural shifts, predicting future trends, and understanding how global events shape online behavior.

#trends #google #googlesearches

ronantakizawa/trending-words-google

reacted to their post with 👍 12 days ago

Post

1618

Introducing the Japanese Character Difficulty Dataset: a collection of 3,003 Japanese characters (Kanji) labeled with official educational difficulty grades. It includes elementary (grades 1–6), secondary (grade 8), and advanced (grade 9) characters, making it useful for language learning, text difficulty analysis, and educational tool development 🎉

ronantakizawa/japanese-character-difficulty

#japanese #kanji #japanesedataset

posted an update 13 days ago

Post

1618

Introducing the Japanese Character Difficulty Dataset: a collection of 3,003 Japanese characters (Kanji) labeled with official educational difficulty grades. It includes elementary (grades 1–6), secondary (grade 8), and advanced (grade 9) characters, making it useful for language learning, text difficulty analysis, and educational tool development 🎉

ronantakizawa/japanese-character-difficulty

#japanese #kanji #japanesedataset
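As a rough illustration of the text difficulty analysis mentioned above, here is a minimal sketch that scores a sentence by the grades of the kanji it contains. The `GRADES` table is a tiny hand-written stand-in, not data loaded from the dataset, and the function name and the returned fields are assumptions made for this example.

```python
# Hypothetical mini grade table (kanji -> grade). In practice this mapping
# would be built from the dataset itself; these five entries are illustrative.
GRADES = {"日": 1, "本": 1, "語": 2, "読": 2, "鬱": 8}


def is_kanji(ch: str) -> bool:
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def text_difficulty(text: str, grades: dict[str, int]) -> dict:
    """Summarize the kanji grades found in `text`.

    Kana, punctuation and Latin characters are ignored. Kanji missing
    from `grades` are reported separately instead of being guessed.
    """
    kanji = [ch for ch in text if is_kanji(ch)]
    known = [grades[ch] for ch in kanji if ch in grades]
    unknown = [ch for ch in kanji if ch not in grades]
    return {
        "max_grade": max(known, default=None),
        "mean_grade": sum(known) / len(known) if known else None,
        "unknown": unknown,
    }


easy = text_difficulty("日本語を読む", GRADES)
hard = text_difficulty("憂鬱", GRADES)
print(easy)
print(hard)
```

The maximum grade is a simple proxy for the hardest character a reader must know, and the mean gives a sense of the overall level. Reporting unknown kanji separately keeps gaps in the table visible.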

reacted to their post with 👍 14 days ago

Post

3285

Reached 1000+ total downloads across my models and datasets! 🎉

Follow me for more @ronantakizawa

2 replies

reacted to their post with 🔥 14 days ago

Post

2283

I built a demo showing how to implement Cache-Augmented Generation (CAG) in an LLM and comparing its performance gains against RAG (111 stars, 20 forks).

https://github.com/ronantakizawa/cacheaugmentedgeneration

CAG preloads document content into an LLM’s context as a precomputed key-value (KV) cache. This caching eliminates the need for real-time retrieval during inference, reducing token usage by up to 76% while maintaining answer quality.

CAG is particularly effective for constrained knowledge bases like internal documentation, FAQs, and customer support systems, where all relevant information can fit within the model's extended context window.

#rag #retrievalaugmentedgeneration

posted an update 16 days ago

Post

2283

I built a demo showing how to implement Cache-Augmented Generation (CAG) in an LLM and comparing its performance gains against RAG (111 stars, 20 forks).

https://github.com/ronantakizawa/cacheaugmentedgeneration

CAG preloads document content into an LLM’s context as a precomputed key-value (KV) cache. This caching eliminates the need for real-time retrieval during inference, reducing token usage by up to 76% while maintaining answer quality.

CAG is particularly effective for constrained knowledge bases like internal documentation, FAQs, and customer support systems, where all relevant information can fit within the model's extended context window.

#rag #retrievalaugmentedgeneration
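The preloading idea described above can be illustrated with a toy token count. This is only a sketch under simplifying assumptions, not the code from the linked repository: `ToyModel` does no real inference, the whitespace tokenizer stands in for a real one, and the baseline re-sends the whole document with every question rather than running a retrieval pipeline. It does not reproduce the 76% figure.

```python
from dataclasses import dataclass


def tokenize(text: str) -> list[str]:
    """Whitespace tokenizer; a stand-in for a real LLM tokenizer."""
    return text.split()


@dataclass
class ToyModel:
    """Counts the tokens it is asked to process; performs no inference."""
    tokens_processed: int = 0

    def encode(self, tokens: list[str]) -> list[str]:
        # Stand-in for a forward pass that would fill a KV cache.
        self.tokens_processed += len(tokens)
        return list(tokens)


def answer_without_cache(model: ToyModel, document: str, questions: list[str]) -> None:
    """Baseline: the document is re-encoded together with every question."""
    for question in questions:
        model.encode(tokenize(document) + tokenize(question))


def answer_with_cache(model: ToyModel, document: str, questions: list[str]) -> None:
    """CAG-style: encode the document once, then only each new question."""
    cached_prefix = model.encode(tokenize(document))  # precomputed once
    for question in questions:
        _context = cached_prefix + model.encode(tokenize(question))


# Hypothetical knowledge base and questions, made up for this example.
document = "refunds are processed within five business days only"
questions = ["how long refunds", "which days count", "are refunds processed"]

baseline, cached = ToyModel(), ToyModel()
answer_without_cache(baseline, document, questions)
answer_with_cache(cached, document, questions)
print(baseline.tokens_processed, cached.tokens_processed)
```

The gap between the two counts grows with the number of questions asked against the same document, which is why the approach suits small, stable knowledge bases.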

reacted to their post with 🔥 17 days ago

Post

3285

Reached 1000+ total downloads across my models and datasets! 🎉

Follow me for more @ronantakizawa

2 replies

replied to their post 17 days ago

Thanks!

posted an update 17 days ago

Post

3285

Reached 1000+ total downloads across my models and datasets! 🎉

Follow me for more @ronantakizawa

2 replies

reacted to their post with 🔥 18 days ago

Post

2969

Introducing the Japanese honorifics dataset: a dataset with 137 sentences covering the three main keigo forms: 尊敬語 (Sonkeigo), 謙譲語 (Kenjōgo), and 丁寧語 (Teineigo). Each entry includes the base form, all three honorific transformations, and English translations for essential phrases in Japanese. This dataset is perfect for training and evaluating the Japanese skill level of LLMs.

#japanese #japanesedataset

ronantakizawa/japanese-honorifics

posted an update 18 days ago

Post

2969

Introducing the Japanese honorifics dataset: a dataset with 137 sentences covering the three main keigo forms: 尊敬語 (Sonkeigo), 謙譲語 (Kenjōgo), and 丁寧語 (Teineigo). Each entry includes the base form, all three honorific transformations, and English translations for essential phrases in Japanese. This dataset is perfect for training and evaluating the Japanese skill level of LLMs.

#japanese #japanesedataset

ronantakizawa/japanese-honorifics

reacted to their post with 👍 24 days ago

Post

1108

Introducing JFLEG-JA, a new Japanese language error correction benchmark with 1,335 sentences, each paired with 4 high-quality human corrections 🎉

Inspired by the English JFLEG dataset, this dataset covers diverse error types, including particle mistakes, kanji mix-ups, and contextually incorrect use of verbs, adjectives, and literary techniques.

You can use this for evaluating LLMs, few-shot learning, error analysis, or fine-tuning correction systems.

ronantakizawa/jfleg-japanese

#japanese #evals #benchmark

posted an update 25 days ago

Post

1108

Introducing JFLEG-JA, a new Japanese language error correction benchmark with 1,335 sentences, each paired with 4 high-quality human corrections 🎉

Inspired by the English JFLEG dataset, this dataset covers diverse error types, including particle mistakes, kanji mix-ups, and contextually incorrect use of verbs, adjectives, and literary techniques.

You can use this for evaluating LLMs, few-shot learning, error analysis, or fine-tuning correction systems.

ronantakizawa/jfleg-japanese

#japanese #evals #benchmark
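To make the evaluation use case above concrete, here is a minimal sketch of scoring a system's correction against several human references. It uses character-level similarity from Python's standard `difflib` as a simple stand-in, not the GLEU metric associated with the English JFLEG, and the sentences are made up for illustration rather than taken from the dataset.

```python
from difflib import SequenceMatcher


def best_reference_similarity(hypothesis: str, references: list[str]) -> float:
    """Return the highest character-level similarity (0.0 to 1.0)
    between the hypothesis and any of the reference corrections."""
    return max(
        SequenceMatcher(None, hypothesis, reference).ratio()
        for reference in references
    )


# Hypothetical example: one erroneous sentence with four corrections.
source = "私は学校に行くました。"
references = [
    "私は学校に行きました。",
    "私は学校へ行きました。",
    "僕は学校に行きました。",
    "私は学校に行った。",
]

# A system that fixes the verb form matches the first reference exactly.
fixed_score = best_reference_similarity("私は学校に行きました。", references)
# A system that returns the input unchanged matches no reference exactly.
unchanged_score = best_reference_similarity(source, references)
print(fixed_score, unchanged_score)
```

Taking the maximum over references means a correction is not penalized for choosing one valid rewrite over another, which is the reason for pairing each sentence with multiple human corrections.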

Ronan Takizawa PRO

AI & ML interests

Recent Activity

Organizations

ronantakizawa's activity