AI Sites Relying on Law Blogs and Open Law For Legal Data

By Kevin O'Keefe on April 21, 2023

Your AI, whether it be Google’s Bard, OpenAI’s ChatGPT, or Microsoft’s Bing, relies heavily on legal blogs and open law for its legal data.

This is according to a piece published this morning by veteran tech blogger and legal journalist Robert Ambrogi.

Though these AI tools are built on large language models (LLMs), little is known about the data on which the models are trained.

Ambrogi reports, though, that The Washington Post has “lifted the cover off this black box.”

Working with the Allen Institute for AI, the Post analyzed Google’s C4 data set, “a massive snapshot of the contents of 15 million websites that have been used to instruct some high-profile English-language AIs,” including Google’s T5 and Facebook’s LLaMA.

It then categorized all of those websites (journalism, entertainment, etc.) and ranked them based on how many “tokens” appeared from each site in the data set, with tokens being the bits of text used to process the disorganized information.
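For readers curious what ranking sites by tokens might look like in practice, here is a minimal sketch in Python. It is not the Post's actual method. The function name, the sample domains, and the text are hypothetical, and splitting on whitespace is a stand-in assumption for the model tokenizer a real analysis would use.

```python
from collections import Counter


def rank_sites_by_tokens(documents):
    """Rank sites by total token count, largest first.

    `documents` is an iterable of (site, text) pairs. Whitespace
    splitting is an assumption made for illustration; a real pipeline
    would use a model tokenizer.
    """
    totals = Counter()
    for site, text in documents:
        totals[site] += len(text.split())
    # Sort by count descending, then by site name so ties are stable.
    ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, site, count)
            for rank, (site, count) in enumerate(ordered, start=1)]


# Hypothetical sample data, not taken from the C4 data set.
sample = [
    ("example-law-blog.test", "the court held that the statute applies"),
    ("example-news.test", "markets rose today"),
    ("example-law-blog.test", "a new case on appeal"),
]
ranking = rank_sites_by_tokens(sample)
for rank, site, count in ranking:
    print(rank, site, count)
```

Under this sketch, a site's rank reflects only how much of its text ended up in the data set, not how authoritative that text is.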

Ambrogi found that his own blog, LawSites, ranked 63,769th among all the sites in the dataset.

Based on searches for words such as law, legal, court and case, Ambrogi found a number of prominent legal blogs in the dataset, ranked as follows:

  • Law Professor Blogs Network, 1,655.
  • LexBlog, 110,534.
  • My Shingle, 164,557.
  • Legal Evolution, 194,595.

It’s interesting that though FindLaw, Justia and Casetext were at the top of the list, Thomson Reuters (175,911) and Bloomberg (11,209,960) were near the bottom of the eighteen law sites Ambrogi identified in the dataset.

This tells me that open law, along with insight and commentary on the law from established authorities, may well be front and center in the law delivered by AI.

Kevin O'Keefe

Trial lawyer turned legal tech entrepreneur, I am the founder and CEO of LexBlog, a global community of legal bloggers which offers individuals and organizations, worldwide, professional turnkey blogging and publishing solutions.

  • Posted in:
    Technology
  • Blog:
    AI in Publishing
  • Organization:
    LexBlog

Copyright © 2026, LexBlog. All Rights Reserved.